{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 12.2 Advanced GroupBy Use（高级GroupBy用法）\n",
    "\n",
    "我们已经在第十章讨论了groupby的一些用法，这里还有一些技巧可能会用得到。\n",
    "\n",
    "# 1 Group Transforms and “Unwrapped” GroupBys（组变换和无包装的GroupBy）\n",
    "\n",
    "在第十章里，使用apply方法在组上进行转换操作的。还有一个内建的方法叫transform，和apply相同，但是在一些函数的用法上有一些限制：\n",
    "\n",
    "- 可以产生一个标量，将数据广播（broadcast）到与组一样的形状（这里的broadcast可以理解为改变数据形状的方法，感兴趣的可以直接搜索 numpy broadcast）\n",
    "- 可以产生一个和输入的组一样形状的对象\n",
    "- 不能对输入进行改变\n",
    "\n",
    "举个例子："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>b</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>a</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>c</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>a</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>b</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>c</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>a</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>b</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>c</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   key  value\n",
       "0    a    0.0\n",
       "1    b    1.0\n",
       "2    c    2.0\n",
       "3    a    3.0\n",
       "4    b    4.0\n",
       "5    c    5.0\n",
       "6    a    6.0\n",
       "7    b    7.0\n",
       "8    c    8.0\n",
       "9    a    9.0\n",
       "10   b   10.0\n",
       "11   c   11.0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,\n",
    "                   'value': np.arange(12.)})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过key来计算组的平均值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key\n",
       "a    4.5\n",
       "b    5.5\n",
       "c    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = df.groupby('key').value\n",
    "g.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们想要产生一个和df['value']一样大小的Series，不过要用key分组后的平均值来替换。我们可以把函数lambda x: x.mean()给transform："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.5\n",
       "1     5.5\n",
       "2     6.5\n",
       "3     4.5\n",
       "4     5.5\n",
       "5     6.5\n",
       "6     4.5\n",
       "7     5.5\n",
       "8     6.5\n",
       "9     4.5\n",
       "10    5.5\n",
       "11    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(lambda x: x.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于内建的聚合函数，我们可以传入一个字符串别名，就像使用groupby agg方法的时候一样："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.5\n",
       "1     5.5\n",
       "2     6.5\n",
       "3     4.5\n",
       "4     5.5\n",
       "5     6.5\n",
       "6     4.5\n",
       "7     5.5\n",
       "8     6.5\n",
       "9     4.5\n",
       "10    5.5\n",
       "11    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform('mean')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "就像apply，transform能用那些返回Series的函数，但是结果的大小和输入的必须一样。例如，我们通过一个lambda函数令每个小组都乘2："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0.0\n",
       "1      2.0\n",
       "2      4.0\n",
       "3      6.0\n",
       "4      8.0\n",
       "5     10.0\n",
       "6     12.0\n",
       "7     14.0\n",
       "8     16.0\n",
       "9     18.0\n",
       "10    20.0\n",
       "11    22.0\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(lambda x: x * 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个更复杂的例子，我们可以按降序来计算每一个组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.0\n",
       "1     4.0\n",
       "2     4.0\n",
       "3     3.0\n",
       "4     3.0\n",
       "5     3.0\n",
       "6     2.0\n",
       "7     2.0\n",
       "8     2.0\n",
       "9     1.0\n",
       "10    1.0\n",
       "11    1.0\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(lambda x: x.rank(ascending=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "考虑一个包含简单聚合的分组转换函数："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def normalize(x):\n",
    "    return (x - x.mean()) / x.std()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用transform或apply，都能得到一样的结果："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -1.161895\n",
       "1    -1.161895\n",
       "2    -1.161895\n",
       "3    -0.387298\n",
       "4    -0.387298\n",
       "5    -0.387298\n",
       "6     0.387298\n",
       "7     0.387298\n",
       "8     0.387298\n",
       "9     1.161895\n",
       "10    1.161895\n",
       "11    1.161895\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform(normalize)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -1.161895\n",
       "1    -1.161895\n",
       "2    -1.161895\n",
       "3    -0.387298\n",
       "4    -0.387298\n",
       "5    -0.387298\n",
       "6     0.387298\n",
       "7     0.387298\n",
       "8     0.387298\n",
       "9     1.161895\n",
       "10    1.161895\n",
       "11    1.161895\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.apply(normalize)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "内建的聚合函数，比如mean, sum经常比一般的apply函数要快。而是用transform的话，会更快一些。这就需要我们使用无包装的组操作（upwrapped group operation）："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4.5\n",
       "1     5.5\n",
       "2     6.5\n",
       "3     4.5\n",
       "4     5.5\n",
       "5     6.5\n",
       "6     4.5\n",
       "7     5.5\n",
       "8     6.5\n",
       "9     4.5\n",
       "10    5.5\n",
       "11    6.5\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g.transform('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -1.161895\n",
       "1    -1.161895\n",
       "2    -1.161895\n",
       "3    -0.387298\n",
       "4    -0.387298\n",
       "5    -0.387298\n",
       "6     0.387298\n",
       "7     0.387298\n",
       "8     0.387298\n",
       "9     1.161895\n",
       "10    1.161895\n",
       "11    1.161895\n",
       "Name: value, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "normalized = (df['value'] - g.transform('mean')) / g.transform('std')\n",
    "normalized"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个无包装的组操作可能会涉及多个组聚合操作，不过向量化操作会胜过这种操作。\n",
    "\n",
    "# 2 Grouped Time Resampling（分组时间重采样）\n",
    "\n",
    "对于时间序列数据，resample方法是一个基于时间的组操作。这里有一个样本表格： "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-05-20 00:01:00</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-05-20 00:02:00</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017-05-20 00:03:00</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2017-05-20 00:04:00</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2017-05-20 00:05:00</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2017-05-20 00:06:00</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2017-05-20 00:07:00</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2017-05-20 00:08:00</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2017-05-20 00:09:00</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2017-05-20 00:10:00</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2017-05-20 00:11:00</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2017-05-20 00:12:00</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2017-05-20 00:13:00</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>2017-05-20 00:14:00</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  time  value\n",
       "0  2017-05-20 00:00:00      0\n",
       "1  2017-05-20 00:01:00      1\n",
       "2  2017-05-20 00:02:00      2\n",
       "3  2017-05-20 00:03:00      3\n",
       "4  2017-05-20 00:04:00      4\n",
       "5  2017-05-20 00:05:00      5\n",
       "6  2017-05-20 00:06:00      6\n",
       "7  2017-05-20 00:07:00      7\n",
       "8  2017-05-20 00:08:00      8\n",
       "9  2017-05-20 00:09:00      9\n",
       "10 2017-05-20 00:10:00     10\n",
       "11 2017-05-20 00:11:00     11\n",
       "12 2017-05-20 00:12:00     12\n",
       "13 2017-05-20 00:13:00     13\n",
       "14 2017-05-20 00:14:00     14"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "N = 15\n",
    "times = pd.date_range('2017-05-20 00:00', freq='1min', periods=N)\n",
    "df = pd.DataFrame({'time': times, 'value': np.arange(N)})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们用time索引，然后重采样："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:00:00</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:05:00</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:10:00</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     value\n",
       "time                      \n",
       "2017-05-20 00:00:00      5\n",
       "2017-05-20 00:05:00      5\n",
       "2017-05-20 00:10:00      5"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.set_index('time').resample('5min').count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设一个DataFrame包含多个时间序列，用多一个key列来表示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>time</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>b</td>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c</td>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>a</td>\n",
       "      <td>2017-05-20 00:01:00</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>2017-05-20 00:01:00</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>c</td>\n",
       "      <td>2017-05-20 00:01:00</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>a</td>\n",
       "      <td>2017-05-20 00:02:00</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key                time  value\n",
       "0   a 2017-05-20 00:00:00    0.0\n",
       "1   b 2017-05-20 00:00:00    1.0\n",
       "2   c 2017-05-20 00:00:00    2.0\n",
       "3   a 2017-05-20 00:01:00    3.0\n",
       "4   b 2017-05-20 00:01:00    4.0\n",
       "5   c 2017-05-20 00:01:00    5.0\n",
       "6   a 2017-05-20 00:02:00    6.0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame({'time': times.repeat(3),\n",
    "                    'key': np.tile(['a', 'b', 'c'], N), \n",
    "                    'value': np.arange(N * 3.)})\n",
    "df2[:7]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "想要对key列的值做重采样，我们引入pandas.TimeGrouper对象："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "time_key = pd.TimeGrouper('5min')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后设置time为索引，对key和time_key做分组，然后聚合："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key</th>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">a</th>\n",
       "      <th>2017-05-20 00:00:00</th>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:05:00</th>\n",
       "      <td>105.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:10:00</th>\n",
       "      <td>180.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">b</th>\n",
       "      <th>2017-05-20 00:00:00</th>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:05:00</th>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:10:00</th>\n",
       "      <td>185.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">c</th>\n",
       "      <th>2017-05-20 00:00:00</th>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:05:00</th>\n",
       "      <td>115.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-05-20 00:10:00</th>\n",
       "      <td>190.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         value\n",
       "key time                      \n",
       "a   2017-05-20 00:00:00   30.0\n",
       "    2017-05-20 00:05:00  105.0\n",
       "    2017-05-20 00:10:00  180.0\n",
       "b   2017-05-20 00:00:00   35.0\n",
       "    2017-05-20 00:05:00  110.0\n",
       "    2017-05-20 00:10:00  185.0\n",
       "c   2017-05-20 00:00:00   40.0\n",
       "    2017-05-20 00:05:00  115.0\n",
       "    2017-05-20 00:10:00  190.0"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resampled = (df2.set_index('time')\n",
    "             .groupby(['key', time_key])\n",
    "             .sum())\n",
    "resampled"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>time</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>2017-05-20 00:05:00</td>\n",
       "      <td>105.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>a</td>\n",
       "      <td>2017-05-20 00:10:00</td>\n",
       "      <td>180.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>2017-05-20 00:05:00</td>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>b</td>\n",
       "      <td>2017-05-20 00:10:00</td>\n",
       "      <td>185.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>c</td>\n",
       "      <td>2017-05-20 00:00:00</td>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>c</td>\n",
       "      <td>2017-05-20 00:05:00</td>\n",
       "      <td>115.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>c</td>\n",
       "      <td>2017-05-20 00:10:00</td>\n",
       "      <td>190.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key                time  value\n",
       "0   a 2017-05-20 00:00:00   30.0\n",
       "1   a 2017-05-20 00:05:00  105.0\n",
       "2   a 2017-05-20 00:10:00  180.0\n",
       "3   b 2017-05-20 00:00:00   35.0\n",
       "4   b 2017-05-20 00:05:00  110.0\n",
       "5   b 2017-05-20 00:10:00  185.0\n",
       "6   c 2017-05-20 00:00:00   40.0\n",
       "7   c 2017-05-20 00:05:00  115.0\n",
       "8   c 2017-05-20 00:10:00  190.0"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resampled.reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用TimeGrouper的一个限制是时间必须是Series或DataFrame的索引才行。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [py35]",
   "language": "python",
   "name": "Python [py35]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
